Multiple-Stage Knowledge Distillation

Authors

Abstract

Knowledge distillation (KD) is a method in which a teacher network guides the learning of a student network, thereby resulting in an improvement in the performance of the student network. Recent research in this area has concentrated on developing effective definitions of knowledge and efficient methods of knowledge transfer while ignoring the learning ability of the student network. To fully utilize this potential and improve learning efficiency, this study proposes a multiple-stage KD (MSKD) method that allows the student to learn the knowledge delivered by the teacher network in multiple stages. The student network in this method consists of a multi-exit architecture, and the student imitates the output of the teacher network at each exit. The final classification is achieved through ensemble learning over the exits. However, because this results in an unreasonable gap between the number of parameters of the teacher branch network and those of the student branch network, as well as a mismatch in learning capacity between these two networks, we extend MSKD to a one-to-one multiple-stage KD method. The experimental results reveal that the proposed method, applied to the CIFAR100 and Tiny ImageNet datasets, exhibits a good classification performance gain. Enhancing KD by changing the learning style provides new insight into KD.
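To illustrate the idea, the following is a minimal PyTorch sketch of a training step in which every exit of a multi-exit student imitates the teacher's output and the final prediction is an ensemble of the exits. It is not the authors' implementation; the helper names (kd_loss, mskd_step, ensemble_predict), the temperature T, the weight alpha, and the assumption that the student returns a list of logits (one per exit) are all illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Hinton-style soft-target loss between one student exit and the teacher.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def mskd_step(student, teacher, x, y, alpha=0.5, T=4.0):
    # One training step: each student exit imitates the teacher's output
    # and is also supervised by the ground-truth labels.
    with torch.no_grad():
        teacher_logits = teacher(x)          # teacher provides the soft targets
    exit_logits = student(x)                 # assumed: list of logits, one per exit
    loss = 0.0
    for logits in exit_logits:
        loss = loss + alpha * kd_loss(logits, teacher_logits, T) \
                    + (1 - alpha) * F.cross_entropy(logits, y)
    return loss / len(exit_logits)

def ensemble_predict(student, x):
    # Final classification: average the softmax outputs of all exits.
    exit_logits = student(x)
    probs = torch.stack([F.softmax(l, dim=1) for l in exit_logits], dim=0)
    return probs.mean(dim=0).argmax(dim=1)

In a one-to-one variant, each student exit would instead be matched to a correspondingly sized teacher branch rather than to a single large teacher output, which is one way to read the capacity-mismatch extension described above.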


Similar Resources

Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...


Topic Distillation with Knowledge Agents

This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...


Reduced distillation models via stage aggregation

A method for deriving computationally efficient reduced nonlinear distillation models is proposed, which extends the aggregated modeling method of Lévine and Rouchon (1991) to complex models. The column dynamics are approximated by a low number of slow dynamic aggregation stages connected by blocks of steady-state stages. This is achieved by simple manipulation of the left-hand sides of the dif...


Knowledge Distillation for Bilingual Dictionary Induction

Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach, where our main contribution is a knowledge distillation training objective. As teachers, rich resource ...


WebChild 2.0 : Fine-Grained Commonsense Knowledge Distillation

Despite important progress in the area of intelligent systems, most such systems still lack commonsense knowledge that appears crucial for enabling smarter, more human-like decisions. In this paper, we present a system based on a series of algorithms to distill fine-grained disambiguated commonsense knowledge from massive amounts of text. Our WebChild 2.0 knowledge base is one of the largest co...



Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app12199453